*************************Canen and Song (2020) - CPS data for empirical application*************************
*Data are downloaded from IPUMS-CPS; please cite IPUMS appropriately, see: https://cps.ipums.org/cps/citation.shtml

*This file loads the raw CPS data and outputs the data used in the empirical application

clear
use cps_00004_raw.dta

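*Romano and Shaikh (2010) sample restrictions
keep if age>=20 & age<=24      // ages 20 to 24
keep if race==100              // IPUMS-CPS RACE code 100 = White
keep if srcearn==1             // source of earnings from longest job: wage and salary
keep if uhrsworkly>=2          // usual hours worked per week last year of at least 2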

*Detach the IPUMS value label from sex so it does not mislabel the recoded values below
label values sex

*Recode sex (2 = Female, 1 = Male) into a dummy equal to 1 if male and 0 if female
*(Romano and Shaikh use the opposite coding, but this one is more convenient for our log() specification because men have higher salaries on average, so delta > 0)
recode sex (2 = 0)
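
*Optional check (a sketch, not part of the original replication code): compare mean log
*wage income across the new dummy to see that men (sex = 1) earn more on average, the
*premise behind delta > 0. Using ln(incwage) as the log() outcome is an assumption, and
*ln_incwage_check is a hypothetical temporary variable that is dropped right away.
*Note that ln() returns missing for zero income, so those observations are skipped.
gen double ln_incwage_check = ln(incwage)
tabulate sex, summarize(ln_incwage_check)
drop ln_incwage_check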

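*Generate treatment variable: = 1 if educ >= 92 (IPUMS-CPS EDUC code), 0 otherwise
*(the missing() guard keeps a missing educ from being coded as 1, since Stata treats missing values as larger than any number)
gen college_grad = (educ >= 92) if !missing(educ)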
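
*Optional check (not part of the original replication code): cross-tabulate educ against
*the treatment to confirm which EDUC codes map to college_grad = 1; the missing option
*also displays any observations where educ or college_grad is missing.
tabulate educ college_grad, missing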

*Summary statistics reported in the Appendix
tab educ
tab college_grad

*Keep only the variables used in the empirical specification and export them to CSV (preserve/restore keeps the full data in memory afterwards)
preserve
keep sex uhrsworkly incwage college_grad
export delimited using empirical_CPS_20Dec2020_short.csv, replace
restore